Phoneme-level Indexing for Fast and Vocabulary-independent Voice/voice Retrieval
Authors
Abstract
This paper reports explorations of a novel approach to speech information retrieval with spoken queries. The method uses a two-layer decoding scheme in which the intermediate representation of speech is based on phonemes, which makes the system vocabulary-independent. Moreover, the use of synchronized lattices at this intermediate level is shown to improve discriminative performance while decreasing the size of the parameter space, at a very reasonable additional computational cost.
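To make the idea of a phoneme-level index concrete, the following is a minimal sketch and not the method of the paper: it indexes 1-best phoneme strings by overlapping n-grams instead of using synchronized lattices or a two-layer decoder. The function names (`phoneme_ngrams`, `build_index`, `search`), the n-gram length, and the toy phoneme transcripts are all assumptions made for illustration.

```python
from collections import defaultdict


def phoneme_ngrams(phonemes, n=3):
    """Return the overlapping phoneme n-grams of a sequence as tuples."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def build_index(documents, n=3):
    """Map each phoneme n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, phonemes in documents.items():
        for gram in phoneme_ngrams(phonemes, n):
            index[gram].add(doc_id)
    return index


def search(index, query_phonemes, n=3):
    """Rank documents by how many distinct query n-grams they share."""
    scores = defaultdict(int)
    for gram in set(phoneme_ngrams(query_phonemes, n)):
        for doc_id in index.get(gram, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical phoneme transcripts (ARPAbet-like symbols, invented for this sketch).
documents = {
    "doc_a": ["v", "oy", "s", "m", "ey", "l"],
    "doc_b": ["m", "iy", "t", "ih", "ng"],
}
index = build_index(documents)
results = search(index, ["v", "oy", "s"])
```

Because the index keys are phoneme sequences and not words, a spoken query can match a document even when the word it contains is absent from any recognition vocabulary, which is the sense in which such a scheme is vocabulary-independent.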
Similar resources
A hybrid word / phoneme-based approach for improved vocabulary-independent search in spontaneous speech
For efficient organization of speech recordings – meetings, interviews, voice mails, and lectures – being able to search for spoken keywords is essential. Today, most spoken document retrieval systems use large-vocabulary recognition. For the above scenarios, such systems suffer from the unpredictable domain, out-of-vocabulary queries, and generally high word-error rate (WER). In [1], we present...
Synthetic phoneme prototypes and dynamic voice source adaptation in speech recognition
A speech-production-oriented technique for generating reference spectral data for speech recognition is presented as an alternative to training on natural speech. The potential of this approach is discussed. In the presented recognition system, the vocabulary and grammar are described as a finite-state network. Phoneme templates are specified in terms of control parameters to a cascade forman...
Synthetic phoneme prototypes and source adaptation in a speech recognition system
A recognition system based on a reference library of synthetic phoneme prototypes is described. The phoneme templates are specified in terms of formant synthesis parameters. The vocabulary and grammar are described in a finite-state network where each state represents a phoneme. A transition between two phonemes in the net is expanded to a number of new states using interpolation on the synthesi...
Teaming Up: Making the Most of Diverse Representations for a Novel Personalized Speech Retrieval Application
In addition to the increasing number of publicly available multimedia documents generated and searched every day, there are also large corpora of personalized videos, images and spoken recordings, stored on users’ private devices and/or in their personal accounts in the cloud. Retrieving spoken items via voice commonly involves supervised indexing approaches such as large vocabulary speech rec...
Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems
This article examines methods for optimizing the performance of GMM-based voice conversion systems, in which the GMM method is introduced as the baseline method for improving voice conversion performance. In current methods, because a single conversion function is used to convert all speech units and statistical averaging causes subsequent spectral smoothing, we will observe quality ...